Structuring Content with XML

Author

  • Erik Wilde
Abstract

XML, as the most successful data representation format, makes it easy to start working with structured data because of the simplicity of XML documents and DTDs, and because of the general availability of tools. This paper first describes the origin and features of XML as a markup language. In a second part, the question of how to use the features provided by XML for structuring content is addressed. Data modeling for electronic publishing and document engineering is a research field with many open issues, the most important open question being what to use as the modeling language for XML-based applications. While the paper does not provide a solution to the modeling language question, it provides guidelines for how to design schemas once the model has been defined.


Similar resources

Metaheuristic clustering of Persian XML documents based on structural and content similarity

Due to the increasing number of XML documents, effectively organizing these documents in order to retrieve useful information from them is essential. A possible solution is to cluster XML documents in order to discover knowledge. A key issue in clustering XML documents is how to measure the similarity between them. Conventional clustering of text documents using a do...


MultiX: an XML based formalism to encode multi- structured documents

This paper concerns the issue of document multi-structuring. For various use objectives, many distinct structures may be defined simultaneously from the same original document. For example, a document may have both a structure for logical content organisation (logical structure), and a structure expressing a set of content formatting rules (physical structure). We have already proposed a generi...


Tools for content-based retrieval and transformation of audio using MPEG-7: the SPOffline and the MDTools

In this paper we present a set of applications for content-based retrieval and transformations of audio recordings. They illustrate diverse aspects of a common framework for music content description and structuring implemented using the MPEG-7 standard. MPEG-7 descriptions can be generated either manually or automatically, and are stored in an XML database. Retrieval services are implemented in...


pdf2table: A Method to Extract Table Information from PDF Files

Tables are a common structuring element in many documents, such as PDF files. To reuse such tables, appropriate methods need to be developed that capture both the structure and the content information. We have developed several heuristics which together recognize and decompose tables in PDF files and store the extracted data in a structured data format (XML) for easier reuse. Additionally, we implem...


Traitements automatiques pour la migration de documents numériques vers XML

More and more companies are migrating their legacy document management systems toward the XML format, the industrial standard for data exchange. In order to reduce the migration cost, we propose an approach aimed at automating the conversion of layout-oriented documents to semantic-oriented annotations. The conversion module uses supervised machine learning techniques to learn a conversion model for...




Publication date: 2006